Abstract

Recently some papers, such as Aban, Meerschaert and Panorska (2006), Nuyts (2010) and Clark (2013), have drawn attention to possible truncation in Pareto tail modelling. Sometimes natural upper bounds exist that truncate the probability tail, such as the Maximum Possible Loss in insurance treaties. In other instances, deviations from Pareto tail behaviour become apparent only at the largest data. This matter is especially important when extrapolation outside the sample is required. Given that in practice one does not always know whether the distribution is truncated or not, we consider estimators for extreme quantiles both under truncated and non-truncated Pareto-type distributions. Hereby we make use of the estimator of the tail index for the truncated Pareto distribution first proposed in Aban et al. (2006).

We also propose a truncated Pareto QQ-plot and a formal test for truncation in order to help decide between a truncated and a non-truncated case. In this way we enlarge the possibilities of extreme value modelling using Pareto tails, offering an alternative scenario by adding a truncation point $T$ that is large with respect to the available data. In the mathematical modelling we hence let $T \to \infty$ at different speeds compared to the limiting fraction ($k/n \to 0$) of data used in the extreme value estimation. This work is motivated using practical examples from different fields of applications, simulation results, and some asymptotic results.

Keywords: Pareto-type distributions, truncation, extreme quantiles, endpoint, QQ-plots.


1 Introduction 


Considering positive data, the Pareto (Pa) distribution is a simple and very popular model with power law probability tail. Using the notation from Aban et al. (2006), the right tail function (RTF)

$$P(W > w) = \tau^{\alpha} w^{-\alpha} \quad \text{for } w > \tau > 0 \text{ and } \alpha > 0 \qquad (1)$$

is considered as the standard example in the max domain of attraction of the Fréchet distribution. For instance, losses in property and casualty insurance often have a heavy right tail, which makes the Pa distribution appropriate for including large events in applications such as excess-of-loss pricing and enterprise risk management (ERM). There might be some practical problems with the use of the Pa distribution and its generalization to the Pa-type model, because some probability mass can still be assigned to loss amounts that are unreasonably large or even physically impossible. In ERM this corresponds to the concept of maximum possible loss. Here we will consider specific data sets on earthquake fatalities and forest fires. For other applications of natural truncation, such as probable maximum precipitation, see Aban et al. (2006). These authors considered the upper-truncated Pareto distribution with RTF

$$P(X > x) = \bar F_T(x) = \frac{\tau^{\alpha}\left(x^{-\alpha} - T^{-\alpha}\right)}{1 - (\tau/T)^{\alpha}} \qquad (2)$$


for $0 < \tau < x < T < \infty$, where $\tau < T$. Next to the RTF, for any given distribution, we make use of the tail quantile function $U$ defined by $U(x) = Q(1 - 1/x)$ ($x > 1$), where $Q(1-p) := \inf\{x : F(x) \ge 1 - p\}$ ($0 < p < 1$) denotes the upper quantile function corresponding to $F$. Henceforth, $\bar F_W$ and $U_W$, respectively $\bar F_T$ and $U_T$, denote the RTF and the tail quantile function of the underlying Pa distribution, respectively of the Pa distribution truncated at $T$ from which the $X$ data are observed with $X =_d W \mid W < T$. Note the following relations between the RTFs and the tail quantile functions:

$$\bar F_T(x) = \frac{\bar F_W(x) - \bar F_W(T)}{1 - \bar F_W(T)}, \qquad (3)$$

$$U_T(y) = U_W\left(y C_T \left[1 + y D_T\right]^{-1}\right) \qquad (4)$$

$$\phantom{U_T(y)} = U_W\left(\frac{1}{\bar F_W(T)}\left[1 + \frac{1}{y D_T}\right]^{-1}\right), \qquad (5)$$

where $D_T = \bar F_W(T)/F_W(T)$ equals the odds ratio of the truncated probability mass under the untruncated Pa-type distribution $W$, and $C_T = 1/F_W(T)$.
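As a quick check, the following short derivation sketches how (4) and (5) follow from (3). It uses only the definitions of $D_T$ and $C_T$ given above and assumes, for simplicity, that $\bar F_W$ is continuous and strictly decreasing, so that $\bar F_T(x) = 1/y$ can be solved for $x = U_T(y)$.

```latex
\begin{align*}
\bar F_T(x) = \frac{1}{y}
  &\iff \bar F_W(x) = \bar F_W(T) + \frac{F_W(T)}{y}
        = F_W(T)\,\frac{1 + y D_T}{y} \\
  &\iff \frac{1}{\bar F_W(x)} = \frac{y\,C_T}{1 + y D_T}
        = \frac{1}{\bar F_W(T)}\,\frac{y D_T}{1 + y D_T}
        = \frac{1}{\bar F_W(T)}\left[1 + \frac{1}{y D_T}\right]^{-1}.
\end{align*}
% Applying U_W to 1/\bar F_W(x) returns x = U_T(y):
% the first form of the argument gives (4), the last form gives (5).
```

The second equality in the last line uses $C_T / D_T = 1/\bar F_W(T)$.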


Aban et al. (2006) derived the conditional maximum likelihood estimator (MLE) based on the $k + 1$ ($0 < k < n$) largest order statistics representing only the portion of the tail where the truncated Pareto (TPa) approximation holds. They showed that, with $X_{1,n} \le \cdots \le X_{n-k,n} \le X_{n-k+1,n} \le \cdots \le X_{n,n}$ denoting the order statistics of an independent and identically distributed sample of size $n$ from $X$, the MLEs obtained under this conditioning model are given by

$$\hat T = X_{n,n}, \qquad \hat\tau = k^{1/\hat\alpha}\, X_{n-k,n}\left(n - (n-k)\left(X_{n-k,n}/X_{n,n}\right)^{\hat\alpha}\right)^{-1/\hat\alpha},$$

where $\hat\alpha = \hat\alpha^T_{k,n}$ solves the equation

$$H_{k,n} = \frac{1}{\hat\alpha^T_{k,n}} + \frac{R_{k,n}^{\hat\alpha^T_{k,n}} \log R_{k,n}}{1 - R_{k,n}^{\hat\alpha^T_{k,n}}}, \qquad (6)$$

where $H_{k,n} = \frac{1}{k}\sum_{j=1}^{k}\left(\log X_{n-j+1,n} - \log X_{n-k,n}\right)$ is the Hill (1975) statistic and $R_{k,n} = X_{n-k,n}/X_{n,n}$. This estimator $\hat\alpha^T_{k,n}$ can be considered as an extension of Hill's (1975) estimator to the case of a TPa distribution, while $H_{k,n}$ was introduced as an estimator of $1/\alpha$ when $T = \infty$.


Independently, Nuyts (2010) considered truncation of Pa distributions and obtained an adaptation of the Hill (1975) estimator through the estimation of $E(\log W \mid W \in [L, R])$ for some finite $0 < L < R$, taking $W$ to be the strict Pa in (1). Then replacing $R$ by the truncation point $T$ and $L$ by some appropriate threshold $t = Q_T(1 - (k+1)/(n+1))$, as commonly considered in extreme value methodology, we obtain in the spirit of Nuyts (2010) that

$$E(\log W - \log t \mid W \in [t,T]) = \frac{\int_t^T \log(w/t)\, dF(w)}{\int_t^T dF(w)} = \frac{1}{\alpha} + \frac{(t/T)^{\alpha}\log(t/T)}{1 - (t/T)^{\alpha}}. \qquad (7)$$
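For completeness, the closed form in (7) can be verified directly for the strict Pa law (1). The following is a sketch of that computation, using the substitution $u = w/t$ and one integration by parts.

```latex
\begin{align*}
\int_t^T \log(w/t)\, dF(w)
  &= \alpha\,\tau^{\alpha} t^{-\alpha} \int_1^{T/t} u^{-\alpha-1}\log u \, du
   = \tau^{\alpha} t^{-\alpha}\left[\frac{1 - (t/T)^{\alpha}}{\alpha}
       + (t/T)^{\alpha}\log(t/T)\right], \\
\int_t^T dF(w)
  &= \tau^{\alpha} t^{-\alpha}\left[1 - (t/T)^{\alpha}\right].
\end{align*}
% Dividing the first line by the second gives
% 1/\alpha + (t/T)^{\alpha}\log(t/T) / (1 - (t/T)^{\alpha}), i.e. (7).
```

Note that the factor $\tau^{\alpha} t^{-\alpha}$ cancels in the ratio, so the result does not depend on $\tau$.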


Estimating $T$ by the maximum $X_{n,n}$ of the sample, $t$ by an order statistic $X_{n-k,n}$, and $E(\log W - \log t \mid W \in [t,T])$ by $H_{k,n}$, we find (6) again. The estimator proposed in Nuyts (2010) differs slightly from $\hat\alpha^T_{k,n}$ but can be shown to be asymptotically equivalent to $\hat\alpha^T_{k,n}$. Nuyts (2010) also considered trimming the estimator by deleting the $r - 1$ ($r \ge 1$) top data, leading to the generalization of (6)

$$\frac{1}{k-r+1}\sum_{j=r}^{k}\left(\log X_{n-j+1,n} - \log X_{n-k,n}\right) = \frac{1}{\hat\alpha_{r,k,n}} + \frac{\left(X_{n-k,n}/X_{n-r+1,n}\right)^{\hat\alpha_{r,k,n}} \log\left(X_{n-k,n}/X_{n-r+1,n}\right)}{1 - \left(X_{n-k,n}/X_{n-r+1,n}\right)^{\hat\alpha_{r,k,n}}}.$$

The estimator $\hat\alpha_{r,k,n}$ provides a way to make the Hill (1975) estimator robust against outliers, but it will be less efficient than the estimator $\hat\alpha^T_{k,n}$ without trimming. While robustness under Pa models has received quite some attention in the literature (see for instance Hubert et al., 2013, and the references therein), we confine ourselves to the case $r = 1$.


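The solution of (6) can be approximated using Newton-Raphson iteration on the equation

$$f(\alpha) := H_{k,n} - \frac{1}{\alpha} - \frac{R_{k,n}^{\alpha}\log R_{k,n}}{1 - R_{k,n}^{\alpha}} = 0$$

to get

$$\hat\alpha^{(l+1)}_{k,n} = \hat\alpha^{(l)}_{k,n} - \frac{H_{k,n} - \dfrac{1}{\hat\alpha^{(l)}_{k,n}} - \dfrac{R_{k,n}^{\hat\alpha^{(l)}_{k,n}}\log R_{k,n}}{1 - R_{k,n}^{\hat\alpha^{(l)}_{k,n}}}}{\dfrac{1}{\left(\hat\alpha^{(l)}_{k,n}\right)^2} - \dfrac{R_{k,n}^{\hat\alpha^{(l)}_{k,n}}\log^2 R_{k,n}}{\left(1 - R_{k,n}^{\hat\alpha^{(l)}_{k,n}}\right)^2}}, \quad l = 0, 1, \ldots, \qquad (8)$$

where the denominator is the derivative $f'(\hat\alpha^{(l)}_{k,n})$ and where for instance Hill's estimator can serve as an initial approximation: $\hat\alpha^{(0)}_{k,n} = 1/H_{k,n}$.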
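The Newton-Raphson step in (8) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names `hill_statistic` and `truncated_pareto_index`, the tolerance and the iteration cap are assumptions made here, and the update is written as the standard Newton step $\alpha - f(\alpha)/f'(\alpha)$ for the function $f$ whose root solves (6). The sketch assumes $X_{n-k,n} < X_{n,n}$ and that the iteration stays at positive values of $\alpha$.

```python
import math


def hill_statistic(x, k):
    """Hill statistic H_{k,n} based on the k largest observations."""
    xs = sorted(x)
    n = len(xs)
    threshold = xs[n - k - 1]  # X_{n-k,n}
    return sum(math.log(v / threshold) for v in xs[n - k:]) / k


def truncated_pareto_index(x, k, tol=1e-10, max_iter=100):
    """Approximate the root of (6) by Newton-Raphson, started at 1/H_{k,n}."""
    xs = sorted(x)
    n = len(xs)
    h = hill_statistic(xs, k)
    log_r = math.log(xs[n - k - 1] / xs[-1])  # log R_{k,n}, negative
    alpha = 1.0 / h                            # Hill-based starting value
    for _ in range(max_iter):
        r_alpha = math.exp(alpha * log_r)      # R_{k,n}^alpha
        f = h - 1.0 / alpha - r_alpha * log_r / (1.0 - r_alpha)
        f_prime = 1.0 / alpha**2 - r_alpha * log_r**2 / (1.0 - r_alpha) ** 2
        step = f / f_prime
        alpha -= step
        if abs(step) < tol:
            break
    return alpha


# Illustrative data (an assumption of this sketch): exact quantiles of a
# truncated Pareto law with alpha = 2, tau = 1 and T = 10 on an equally
# spaced probability grid, so that no random number generator is needed.
n, alpha_true, T = 1000, 2.0, 10.0
data = [(1 - (i - 0.5) / n * (1 - T ** -alpha_true)) ** (-1 / alpha_true)
        for i in range(1, n + 1)]
print(truncated_pareto_index(data, k=n - 1))
```

On such idealized data the returned value should lie close to the true index 2, whereas the starting value $1/H_{k,n}$ is somewhat larger, reflecting the effect of the truncation on the Hill statistic.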

The main purpose of this paper is to consider tail estimation for truncated and non-truncated distributions in case the Pa behaviour starts to set in from an intermediate threshold $t = Q_T(1 - (k+1)/(n+1))$ on. In case of no truncation, this setting has been formalized mathematically by the concept of Pa-type distributions, defined by

$$\bar F_W(w) = w^{-\alpha}\,\ell_F(w), \quad \alpha > 0, \qquad (9)$$

where $\ell_F$ is a slowly varying function at infinity, i.e. $\lim_{t\to\infty} \ell_F(ty)/\ell_F(t) = 1$ for every $y > 0$. Hence, under this model, with $W/t$ denoting a peak over a threshold $t$ with $W > t$,

$$P(W/t > y \mid W > t) \to y^{-\alpha} \quad \text{as } t \to \infty, \text{ for every } y > 1.$$

It is well-known that then also

$$U_W(y) = y^{1/\alpha}\,\ell_U(y), \quad y > 1, \qquad (10)$$

where $\ell_U$ is again a slowly varying function.

In extreme value statistics the parameter $\xi := 1/\alpha$ is referred to as the extreme value index (EVI). The EVI $\xi$ is the shape parameter in the generalized extreme value distribution

$$G_\xi(x) = \exp\left(-(1 + \xi x)^{-1/\xi}\right), \quad \text{for } 1 + \xi x > 0.$$

This class of distributions is the set of the unique non-degenerate limit distributions of a sequence of maximum values, linearly normalized. In case $\xi > 0$ the class of distributions for which the maxima are attracted to $G_\xi$ corresponds to the Pa-type distributions in (9). Note that for a given fixed $T$, truncated Pa models are known to exhibit an EVI $\xi = -1$, see for instance Figure 2.8 in Beirlant et al. (2004).

To illustrate the practical importance of the present setting, we consider the data set containing fatalities due to large earthquakes as published by the U.S. Geological Survey on http://earthquake.usgs.gov/earthquakes/world/, which were also used in Clark (2013). It contains the estimated number of deaths for the 124 events between 1900 and 2011 with at least 1000 deaths. In Figure 1 the Pa QQ-plot (or log-log plot)

$$\left(\log X_{n-j+1,n}, \log(j/n)\right), \quad j = 1, \ldots, n, \qquad (11)$$

is given. It exhibits a linear or Pa pattern for a large section of the data, while the final end of the plot is curved. The plot of $1/H_{k,n}$ as the classical estimates of $\alpha$ based on an untruncated Pa distribution indeed bends upwards towards smaller values of $k$. This indicates that the unbounded Pareto pattern could be violated in this example. On this plot the extrapolations using a Pa distribution (1), respectively the TPa model (2) (with the linear, respectively the curved extrapolation), are plotted based on the largest 21 data points as proposed in Clark (2013), such that $\hat\alpha^T_{21,n} = 0.43$ and $1/H_{21,n} = 0.90$. In section 5 tail extrapolation based on the TPa model will indeed appear to be appropriate in this case.


[Figure 1: two panels, each titled "Earthquakes 1,000 or more deaths (1900-), n = 124"; horizontal axis of the left panel: $\log(X_{n-j+1,n})$, $j = 1, \ldots, n$.]

Figure 1: Earthquake fatalities data set. Left: Pa QQ-plot with extrapolations anchored at $\log(X_{n-21,n})$ based on a non-truncated Pa model in (1) (dotted line) and a truncated Pareto model in (2) (full line) as proposed in Clark (2013). Right: plot of $1/H_{k,n}$ and $\hat\alpha^T_{k,n}$ for $k = 1, \ldots, n$.


As suggested in Clark (2013), $\alpha$ could also be taken to be zero or negative. For instance, formally setting $\alpha = -1$ in (2), one obtains the tail of a uniform type distribution. Finite tail distributions of this form with a negative value of $\alpha$ show a fast rate of convergence to $T$ for small $p$ and when $T$ is a big number, due to the presence of $D_T$ in the corresponding quantile expression. In the applications we have in mind here, as we allow $T$ to be large, convergence to $T$ is slow and hence a positive value of $\alpha$ is appropriate. Closely related to this paper, Chakrabarty and Samorodnitsky (2012) considered a different truncation model, whereby a heavy tailed random variable $W$ is truncated at a high value $M \to \infty$ and an exponentially tailed random variable $R$ is added, so that the probability mass of $W$ beyond $M$ is placed at $M + R$. These authors show the consistency of Hill's estimator and provide a test for this kind of truncation. However, they did not consider the estimation of extreme quantiles.


In the next sections we specify the truncation models and provide estimators for extreme quantiles and for $T$. In section 4 we consider the problem of deciding between a Pa-type case in (9) and a TPa-type case in (12), or, between light and rough truncation. To this end we construct a TPa QQ-plot, provide a new significance test and compare it with the test proposed in Aban et al. (2006). In section 5 we study the finite sample behaviour of the proposed estimators using simulations and practical examples. The asymptotic properties of $\hat\alpha^T_{k,n}$ and the extreme quantile estimators under rough and light truncation are discussed in a final section. Proofs are deferred to the supplementary material.


2 Rough and light truncation of Pareto-type distributions

Truncation of a Pa-type distribution at a value $T$ necessarily requires $t < T \to \infty$ and

$$P(X/t > y \mid X > t) = P(W/t > y \mid t < W < T) = \frac{(yt)^{-\alpha}\ell_F(yt) - T^{-\alpha}\ell_F(T)}{t^{-\alpha}\ell_F(t) - T^{-\alpha}\ell_F(T)} = \frac{y^{-\alpha}\dfrac{\ell_F(yt)}{\ell_F(t)} - \left(\dfrac{T}{t}\right)^{-\alpha}\dfrac{\ell_F(T)}{\ell_F(t)}}{1 - \left(\dfrac{T}{t}\right)^{-\alpha}\dfrac{\ell_F(T)}{\ell_F(t)}}.$$

One can now consider two cases as $t, T \to \infty$:

• Rough truncation when $T/t \to \beta > 1$, and by Karamata's uniform convergence theorem for regularly varying functions (Seneta, 1976),

$$P(X/t > y \mid X > t) \to \frac{y^{-\alpha} - \beta^{-\alpha}}{1 - \beta^{-\alpha}}, \quad 1 < y < \beta. \qquad (12)$$

This corresponds to situations where the deviation from the Pa behaviour due to truncation at a high value will be visible in the data, and an adaptation of the classical Pa tail extrapolation methods appears appropriate.

• Light truncation when $T/t \to \infty$,

$$P(X/t > y \mid X > t) \to y^{-\alpha}, \quad y > 1, \qquad (13)$$

and hardly any truncation is visible in the data, so that the Pa-type model (9) without truncation and the existing extreme value methods for Pa-type tails are appropriate.


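When $T \to \infty$ we have from (5) that

$$U_T(y) = U_W\left(\frac{1}{\bar F_W(T)}\left[1 + \frac{1}{yD_T}\right]^{-1}\right) = U_W\left(\frac{1}{\bar F_W(T)}\right)\left[1 + \frac{1}{yD_T}\right]^{-1/\alpha} C_{y,T},$$

where

$$C_{y,T} := \frac{\ell_U\left(\dfrac{1}{\bar F_W(T)}\left[1 + \dfrac{1}{yD_T}\right]^{-1}\right)}{\ell_U\left(\dfrac{1}{\bar F_W(T)}\right)}. \qquad (14)$$

Since $U_W(1/\bar F_W(T)) = T$ and $1/\bar F_W(T) \to \infty$, we now obtain that if $yD_T$ is bounded away from 0

$$Q_T(1 - 1/y) = U_T(y) = T\left[1 + \frac{1}{yD_T}\right]^{-1/\alpha} C_{y,T}, \qquad (15)$$

where $C_{y,T} \to 1$ as $T \to \infty$ using Karamata's uniform convergence theorem. Note that $C_{y,T} = 1$ in case $W$ satisfies (1).

In case $yD_T \to 0$ it follows similarly from (4) that

$$Q_T(1 - 1/y) = U_T(y) = (yC_T)^{1/\alpha}\left[1 + yD_T\right]^{-1/\alpha}\,\ell_U\left(yC_T\left[1 + yD_T\right]^{-1}\right). \qquad (16)$$

Hence, with $(n+1)/(k+1)$ playing the role of $y$, and choosing $t = Q_T(1 - (k+1)/(n+1)) = U_T((n+1)/(k+1))$, the conditions of rough and light truncation are rephrased as follows, in accordance with (15) and (16):

• Rough truncation when

$$T/U_T\left(\frac{n+1}{k+1}\right) \to \beta > 1, \quad \text{or} \quad \frac{k+1}{(n+1)D_T} \to \kappa = \beta^{\alpha} - 1,$$

and

$$\frac{U_T\left(u\,\frac{n+1}{k+1}\right)}{U_T\left(\frac{n+1}{k+1}\right)} \to \left(\frac{1 + \kappa}{1 + \kappa/u}\right)^{1/\alpha}, \quad u > 1;$$

• Light truncation when

$$T/U_T\left(\frac{n+1}{k+1}\right) \to \infty, \quad \text{or} \quad \frac{(n+1)D_T}{k+1} \to 0,$$

and

$$\frac{U_T\left(u\,\frac{n+1}{k+1}\right)}{U_T\left(\frac{n+1}{k+1}\right)} \to u^{1/\alpha}, \quad u > 1. \qquad (17)$$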


The estimation of $\alpha$ and $k/(nD_T)$ (or $\kappa$, $\beta$) will now constitute important steps in order to arrive at estimators of extreme quantiles $Q_T(1 - p)$ and $T$. Given the fact that our model is only defined choosing $k, n, T \to \infty$, $k/n \to 0$ jointly with the corresponding conditions for rough and light truncation, the underlying model depends on $n$ and a triangular array formulation $X_{n1}, \ldots, X_{nn}$ of the observations should be used in order to emphasize the nature of the model. However, in statistical procedures as presented here, when a single sample is given, the notation $X_1, \ldots, X_n$ is more natural and will be used throughout.


3 Estimation of extreme quantiles 


From (15) it is clear that the estimation of $D_T$ is an intermediate step in important estimation problems following the estimation of $\alpha$, namely of extreme quantiles and of the endpoint $T$. From (15)

$$\left(\frac{Q_T\left(1 - \frac{k+1}{n+1}\right)}{Q_T\left(1 - \frac{1}{n+1}\right)}\right)^{\alpha} \approx \frac{1 + \dfrac{1}{(n+1)D_T}}{1 + \dfrac{k+1}{(n+1)D_T}} = \frac{1}{k+1}\,\frac{1 + (n+1)D_T}{1 + \dfrac{(n+1)D_T}{k+1}}. \qquad (18)$$

Motivated by (18) and estimating $Q_T(1 - (k+1)/(n+1))/Q_T(1 - 1/(n+1))$ by $R_{k,n}$, we propose

$$\hat D_T = \hat D_{T,k,n} := \frac{k+1}{n+1}\,\frac{R_{k,n}^{\hat\alpha^T_{k,n}} - \dfrac{1}{k+1}}{1 - R_{k,n}^{\hat\alpha^T_{k,n}}} \qquad (19)$$

as an estimator for $D_T$ in case of truncated and non-truncated Pa-type distributions. In practice we will make use of the admissible estimator

$$\hat D_T^{(0)} := \max\left\{\hat D_T, 0\right\}.$$

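In case $D_T > 0$, in order to construct estimators of $T$ and extreme quantiles $q_p = Q_T(1 - p)$, as in (18) we find that

$$\left(\frac{Q_T(1 - p)}{Q_T\left(1 - \frac{k+1}{n+1}\right)}\right)^{\alpha} \approx \frac{1 + \dfrac{k+1}{(n+1)D_T}}{1 + \dfrac{p}{D_T}} = \frac{D_T + \dfrac{k+1}{n+1}}{D_T + p}. \qquad (20)$$

Then taking logarithms on both sides of (20) and estimating $Q_T(1 - (k+1)/(n+1))$ by $X_{n-k,n}$ we find an estimator $\hat q^T_p := \hat q^T_{p,k,n}$ of $q_p$:

$$\log \hat q^T_{p,k,n} = \log X_{n-k,n} + \frac{1}{\hat\alpha^T_{k,n}}\log\left(\frac{\hat D_T + \dfrac{k+1}{n+1}}{\hat D_T + p}\right). \qquad (21)$$

Note that $\hat q^T_p$ can also be rewritten as

$$\log \hat q^T_{p,k,n} = \log X_{n-k,n} + \frac{1}{\hat\alpha^T_{k,n}}\log\left(\frac{1 + \dfrac{k+1}{(n+1)\hat D_T}}{1 + \dfrac{p}{\hat D_T}}\right), \qquad (22)$$

$$\hat q^T_{p,k,n} = X_{n-k,n}\left(\frac{k+1}{(n+1)p}\right)^{1/\hat\alpha^T_{k,n}}\left(\frac{1 + \dfrac{(n+1)\hat D_T}{k+1}}{1 + \dfrac{\hat D_T}{p}}\right)^{1/\hat\alpha^T_{k,n}}. \qquad (23)$$

An estimator of $T$ follows from letting $p \to 0$,

$$\log \hat T_{k,n} = \max\left\{\log \hat q^T_{0,k,n}, \log X_{n,n}\right\}, \qquad \log \hat q^T_{0,k,n} = \log X_{n-k,n} + \frac{1}{\hat\alpha^T_{k,n}}\log\left(1 + \frac{k+1}{(n+1)\hat D_T}\right),$$

taking the maximum with $\log X_{n,n}$ in order for this endpoint estimator to be admissible.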
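The chain from (19) to the quantile estimator (21) and the endpoint estimator can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the index estimate `alpha_hat` is supplied by the caller (for instance from a solver of (6)), the admissible version $\max\{\hat D_T, 0\}$ is used throughout, and returning infinity when that value is 0 is a choice made here to reflect that no finite endpoint exists in that case.

```python
import math


def odds_ratio_estimate(x, k, alpha_hat):
    """Admissible odds-ratio estimator max{D_hat, 0}, with D_hat as in (19)."""
    xs = sorted(x)
    n = len(xs)
    r_alpha = (xs[n - k - 1] / xs[-1]) ** alpha_hat  # R_{k,n}^alpha_hat
    d_hat = (k + 1) / (n + 1) * (r_alpha - 1 / (k + 1)) / (1 - r_alpha)
    return max(d_hat, 0.0)


def quantile_estimate(x, k, alpha_hat, p):
    """Estimator (21) of Q_T(1 - p); needs D_hat + p > 0."""
    xs = sorted(x)
    n = len(xs)
    d_hat = odds_ratio_estimate(xs, k, alpha_hat)
    if d_hat + p <= 0:
        raise ValueError("D_hat + p must be positive")
    ratio = (d_hat + (k + 1) / (n + 1)) / (d_hat + p)
    return xs[n - k - 1] * ratio ** (1 / alpha_hat)


def endpoint_estimate(x, k, alpha_hat):
    """Endpoint estimator: (21) with p = 0, never below the sample maximum."""
    if odds_ratio_estimate(x, k, alpha_hat) == 0.0:
        return math.inf  # assumption of this sketch: no finite endpoint
    return max(quantile_estimate(x, k, alpha_hat, 0.0), max(x))


# Illustrative data (assumption): exact quantiles of a truncated Pareto law
# with alpha = 2, tau = 1 and T = 10, so that D_T = 0.01 / 0.99.
n = 1000
data = [(1 - (i - 0.5) / n * 0.99) ** -0.5 for i in range(1, n + 1)]
print(odds_ratio_estimate(data, n - 1, 2.0))
print(quantile_estimate(data, n - 1, 2.0, p=0.01))
print(endpoint_estimate(data, n - 1, 2.0))
```

With the true index plugged in, the three printed values should be close to $D_T = 0.01/0.99$, to the true quantile $Q_T(0.99) \approx 7.09$ and to the endpoint $T = 10$, respectively.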
Equation (23) for $\hat q^T_{p,k,n}$ constitutes an adaptation to the TPa case of the Weissman (1978) estimator

$$\hat q^W_{p,k,n} = X_{n-k,n}\left(\frac{k+1}{(n+1)p}\right)^{H_{k,n}}, \qquad (24)$$

which is valid under (9). Expression (23) is more adapted to the case of light truncation or $(n+1)D_T/(k+1) \to 0$. Version (22) can be linked to the case of rough truncation or $(k+1)/((n+1)D_T) \to \kappa$. Version (21) can be applied in all cases. However, in section 6, in case of light truncation $\hat q^T_{p,k,n}$ will be shown to be consistent only when $(np_n)^{-1} \to 0$. When this condition is not satisfied under light truncation we propose to use

$$\log \hat q^{\infty}_{p,k,n} = \log X_{n-k,n} + \frac{1}{\hat\alpha^T_{k,n}}\log\left(\frac{k+1}{(n+1)p}\right). \qquad (25)$$

Note that such alternative expressions do not exist for the estimation of the endpoint $T$, as in case $D_T = 0$ no finite endpoint exists.


4 Goodness-of-fit and testing for truncated Pareto-type distributions

From the preceding sections the need for goodness-of-fit methods and tests for truncated Pa distributions became apparent. Based on a chosen value $\hat D_{T,k^*,n}$ for a particular $k^*$, we propose the TPa QQ-plot to verify the validity of (15):

$$\left(\log X_{n-j+1,n}, \log\left(\hat D_{T,k^*,n} + j/n\right)\right), \quad j = 1, \ldots, n. \qquad (26)$$

Note that when $T = \infty$ or $D_T = 0$ the TPa QQ-plot agrees with the classical Pareto QQ-plot (11). Under (15) an ultimately linear pattern should be observed to the right of some anchor point, i.e. at the points with indices $j = 1, \ldots, k$ for some $1 < k < n$. From this, we propose to choose the value of $k^*$ in practice as the value that maximizes the correlation between $\log X_{n-j+1,n}$ and $\log(\hat D_{T,k^*,n} + j/n)$ for $j = 1, \ldots, k^*$ and $k^* > 10$. This choice can be improved in future work since the deviations of the points on the TPa QQ-plot from the reference line are neither independent nor identically distributed. This issue was addressed for the Pa QQ-plot in Beirlant et al. (1996) and Aban and Meerschaert (2004) and should be considered in the truncated case too.
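The selection rule for $k^*$ can be illustrated with a short sketch. The function names and the dictionary `d_hat_by_k` (mapping each candidate $k$ to a precomputed value of $\hat D_{T,k,n}$) are assumptions of this example. Because the coordinates in (26) are negatively related under (15), the sketch reads "maximizes the correlation" as maximal absolute correlation; this reading is an assumption and not something stated in the text.

```python
import math


def tpa_qq_correlation(x, k, d_hat):
    """Pearson correlation of the k right-most points of the TPa QQ-plot (26)."""
    xs = sorted(x)
    n = len(xs)
    u = [math.log(xs[n - j]) for j in range(1, k + 1)]       # log X_{n-j+1,n}
    v = [math.log(d_hat + j / n) for j in range(1, k + 1)]   # log(D_hat + j/n)
    mean_u, mean_v = sum(u) / k, sum(v) / k
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


def choose_k_star(x, d_hat_by_k, k_min=11):
    """Return the candidate k > 10 with the largest absolute correlation."""
    candidates = [k for k in d_hat_by_k if k >= k_min]
    return max(candidates,
               key=lambda k: abs(tpa_qq_correlation(x, k, d_hat_by_k[k])))


# Illustrative data (assumption): exact quantiles of a truncated Pareto law
# with alpha = 2 and T at the 90% quantile of W, so that D_T = 0.1 / 0.9.
n = 500
data = [(1 - 0.9 * (i - 0.5) / n) ** -0.5 for i in range(1, n + 1)]
print(tpa_qq_correlation(data, 400, 0.1 / 0.9))
```

For data that follow the TPa model exactly and with the true odds ratio plugged in, the printed correlation should be very close to $-1$, which corresponds to a straight line with slope $-1/\alpha$ on the plot.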


Aban et al. (2006) already proposed a test for $H_0^{(1)}: T = \infty$ versus $H_1^{(1)}: T < \infty$ under the strict Pa and TPa models, rejecting $H_0^{(1)}$ at asymptotic level $q \in (0,1)$ when

$$X_{n,n} < \left(\frac{nA}{-\log q}\right)^{1/\alpha} \qquad (27)$$

for some $1 < k < n$ and where $A = \tau^{\alpha}$ in (1). In (27), $\alpha$ is estimated by the maximum likelihood estimator $1/H_{k,n}$ under $H_0^{(1)}$, based on the Hill (1975) estimator, while

$$\hat A = \frac{k}{n}\, X_{n-k,n}^{1/H_{k,n}}. \qquad (28)$$

Note that the rejection rule (27) can be rewritten as

$$T_{A,k,n} := k\, R_{k,n}^{1/H_{k,n}} > \log\frac{1}{q}, \qquad (29)$$

and the p-value is given by $\exp\left(-k\, R_{k,n}^{1/H_{k,n}}\right)$.

Here we also consider the problem of testing light versus rough truncation, i.e.

$$H_0^{(2)}: X \text{ satisfies (13)} \quad \text{versus} \quad H_1^{(2)}: X \text{ satisfies (12)}.$$

In the next sections we inspect the finite sample and asymptotic properties of the test (27) under $H_0^{(2)}$ and $H_1^{(2)}$.


We also propose a different test, rejecting $H_0^{(2)}$ when an appropriate estimator of $(n+1)D_T/(k+1)$ is significantly different from 0. Here we construct such an estimator generalizing $\hat D_T$, using an average of the ratios $(X_{n-k,n}/X_{n-j+1,n})^{\alpha}$, $j = 1, \ldots, k$, which then possesses an asymptotic normal distribution under the null hypothesis. Observe that under (15) as $k \to \infty$

$$MR_{k,n} := \frac{1}{k}\sum_{j=1}^{k}\left(\frac{Q_T\left(1 - \frac{k+1}{n+1}\right)}{Q_T\left(1 - \frac{j}{n+1}\right)}\right)^{\alpha} \approx \frac{1}{k}\sum_{j=1}^{k}\frac{1 + \dfrac{j}{(n+1)D_T}}{1 + \dfrac{k+1}{(n+1)D_T}} = \frac{1 + \dfrac{1}{2}\dfrac{k+1}{(n+1)D_T}}{1 + \dfrac{k+1}{(n+1)D_T}}.$$

Estimating $MR_{k,n}$ by

$$E_{k,n}(\hat\alpha) := \frac{1}{k}\sum_{j=1}^{k}\left(\frac{X_{n-k,n}}{X_{n-j+1,n}}\right)^{\hat\alpha}$$

leads now to

$$L_{k,n}(\hat\alpha) := \frac{E_{k,n}(\hat\alpha) - \dfrac{1}{2}}{1 - E_{k,n}(\hat\alpha)} \qquad (30)$$

as an estimator of $(n+1)D_T/(k+1)$, with $\hat\alpha$ an appropriate estimator of $\alpha$. Under $H_0^{(2)}$ the reciprocal $1/H_{k,n}$ of the Hill (1975) estimator is an appropriate estimator of $\alpha$. Moreover, in the final section it will be stated that under some regularity assumptions on the underlying Pa-type distribution, we have under $H_0^{(2)}$ for $k, n \to \infty$ and $k/n \to 0$, that $\sqrt{k}\, L_{k,n}(1/H_{k,n})$ is asymptotically normal with mean 0 and variance 1/12. Moreover, it is then also shown that under rough truncation, as $k, n, T \to \infty$, $k/n \to 0$ and $nD_T/k \to 1/\kappa > 0$, $L_{k,n}(1/H_{k,n})$ converges in probability to a strictly negative limit depending on $\kappa$, so that an asymptotic test based on $L_{k,n}(1/H_{k,n})$ rejects $H_0^{(2)}$ on asymptotic level $q$ when

$$T_{B,k,n} := \sqrt{12k}\, L_{k,n}(1/H_{k,n}) < -z_q \qquad (31)$$

with $P(\mathcal N(0,1) > z_q) = q$. The p-value is then given by $\Phi\left(\sqrt{12k}\, L_{k,n}(1/H_{k,n})\right)$. Both tests (27) and (31) will further be compared below.
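To make the two test statistics concrete, the following sketch computes $T_{A,k,n}$ from (29) and $T_{B,k,n}$ from (31), together with their p-values. The function name and the returned dictionary layout are hypothetical; the standard normal distribution function $\Phi$ is evaluated through `math.erf`.

```python
import math


def truncation_tests(x, k):
    """Statistics and p-values of the tests based on T_A (29) and T_B (31)."""
    xs = sorted(x)
    n = len(xs)
    threshold = xs[n - k - 1]                      # X_{n-k,n}
    top = xs[n - k:]                               # X_{n-k+1,n}, ..., X_{n,n}
    hill = sum(math.log(v / threshold) for v in top) / k
    alpha0 = 1.0 / hill                            # index estimate under H0
    t_a = k * (threshold / xs[-1]) ** alpha0       # k * R_{k,n}^(1/H_{k,n})
    p_a = math.exp(-t_a)
    e_kn = sum((threshold / v) ** alpha0 for v in top) / k
    l_kn = (e_kn - 0.5) / (1.0 - e_kn)             # estimator (30)
    t_b = math.sqrt(12.0 * k) * l_kn
    p_b = 0.5 * (1.0 + math.erf(t_b / math.sqrt(2.0)))  # Phi(t_b)
    return {"T_A": t_a, "p_A": p_a, "T_B": t_b, "p_B": p_b}


# Illustrative data (assumption): exact quantiles on a probability grid of a
# Pareto law with alpha = 2, without truncation and truncated at Q_W(0.90).
n = 1000
pareto = [(1 - (i - 0.5) / n) ** -0.5 for i in range(1, n + 1)]
truncated = [(1 - 0.9 * (i - 0.5) / n) ** -0.5 for i in range(1, n + 1)]
print(truncation_tests(pareto, 200))
print(truncation_tests(truncated, 500))
```

On these idealized samples neither test should reject for the untruncated data, while both p-values should be very small for the roughly truncated data.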


5 Practical examples and simulations 


First we retake the data set containing fatalities due to large earthquakes from Figure 1. In Figure 2 (middle) the estimates $\hat\alpha^T_{k,n}$ and $\hat D_{T,k,n}$ are plotted against $k = 1, \ldots, n$. Here we have chosen $k^* = 100$ as a typical value where both plots are horizontal in $k$. The TPa QQ-plot in (26) is given in Figure 2 (top left), using the above mentioned value $k^* = 100$. Next, the p-values as a function of $k$, both for the $T_A$ and $T_B$ tests, are given (top right). Finally in Figure 2 (bottom) the estimates of the extreme quantile $q_{0.01}$ using (21) and the endpoint $T$ (obtained by letting $p \to 0$) are presented as a function of $k$. They are contrasted with the values obtained by the classical method of moment estimates as introduced in Dekkers et al. (1989), illustrating the slow convergence of the classical extreme value methods in the TPa-type model we study here. For any real EVI, the classical moment $\xi$-estimator is defined by

$$\hat\xi^{MOM}_{n,k} := M^{(1)}_{n,k} + 1 - \frac{1}{2}\left(1 - \frac{\left(M^{(1)}_{n,k}\right)^2}{M^{(2)}_{n,k}}\right)^{-1} \qquad (32)$$

with $M^{(j)}_{n,k} := \frac{1}{k}\sum_{i=0}^{k-1}\log^j\left(X_{n-i,n}/X_{n-k,n}\right)$, $j = 1, 2$, which constitutes a consistent estimator for $\xi \in \mathbb{R}$. The Hill estimator is $M^{(1)}_{n,k} = H_{k,n}$. The MOM-estimators for high quantiles and right endpoint, based on the moment estimator $\hat\xi^{MOM}_{n,k}$, are defined by (see de Haan and Ferreira, 2006, §4.3.2, for details)

$$\hat q^{MOM}_{p,k,n} := X_{n-k,n} + X_{n-k,n}\, M^{(1)}_{n,k}\left(1 - \hat\xi^{-}_{n,k}\right)\frac{\left(\dfrac{k}{np}\right)^{\hat\xi^{MOM}_{n,k}} - 1}{\hat\xi^{MOM}_{n,k}} \qquad (33)$$

and

$$\hat T^{MOM}_{k,n} := \max\left\{X_{n,n},\; X_{n-k,n} - \frac{X_{n-k,n}\, M^{(1)}_{n,k}\left(1 - \hat\xi^{-}_{n,k}\right)}{\hat\xi^{MOM}_{n,k}}\right\}, \qquad (34)$$

where $\hat\xi^{-}_{n,k} := 1 - \frac{1}{2}\left(1 - \left(M^{(1)}_{n,k}\right)^2/M^{(2)}_{n,k}\right)^{-1}$. Notice that $\hat T^{MOM}_{k,n}$ in (34) corresponds to the admissible version of the moment endpoint estimator, since the latter can return values below the sample maximum.
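The moment-based competitors can be sketched as follows. The code follows the standard formulas of Dekkers et al. (1989) and de Haan and Ferreira (2006, §4.3.2) as they are commonly stated; the function name is hypothetical, and the sketch assumes that the top observations are not all equal and that the resulting index estimate is nonzero.

```python
import math


def moment_estimators(x, k, p):
    """Moment estimator of the EVI and the associated quantile estimator."""
    xs = sorted(x)
    n = len(xs)
    threshold = xs[n - k - 1]                            # X_{n-k,n}
    logs = [math.log(v / threshold) for v in xs[n - k:]]
    m1 = sum(logs) / k                                   # M^(1), the Hill statistic
    m2 = sum(value * value for value in logs) / k        # M^(2)
    xi_minus = 1.0 - 0.5 / (1.0 - m1 * m1 / m2)          # negative-part component
    xi_mom = m1 + xi_minus                               # moment estimator (32)
    scale = threshold * m1 * (1.0 - xi_minus)
    quantile = threshold + scale * ((k / (n * p)) ** xi_mom - 1.0) / xi_mom
    return xi_mom, quantile


# Illustrative data (assumption): exact quantiles of an untruncated Pareto
# law with alpha = 2 (EVI 0.5) on an equally spaced probability grid.
n = 1000
pareto = [(1 - (i - 0.5) / n) ** -0.5 for i in range(1, n + 1)]
print(moment_estimators(pareto, k=200, p=0.001))
```

For this untruncated example the first returned value should be near the true EVI 0.5 and the second near the true quantile $0.001^{-1/2} \approx 31.6$.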


Concerning the high quantile estimation, the chosen value $p = 0.01$ is directly related to the modest sample size here of $n = 124$. The quantile estimates $\hat q^T_{0.01,k,n}$ reveal a stable pattern in $k$ in Figure 2 (bottom left). While on the basis of Figure 2 $T_A$ is only borderline significant for small values of $k$, the TPa-type model with a truncation point $T$ around 400,000 deaths offers a convincing fit, and leads to a useful estimator for extreme quantiles.

[Figure 2: six panels with titles "Earthquakes fatalities (1900-): TPa QQ-plot", "Earthquakes fatalities (1900-): Tests" and "Earthquakes 1,000 or more deaths (1900-), n = 124"; horizontal axis $k$ where applicable.]

Figure 2: Earthquake fatalities data set. Top: TPa QQ-plot (left) for the earthquake fatalities data set using $k^* = 100$; plot of p-values based on $T_A$ and $T_B$ (right). Middle: plots of Pareto index $\hat\alpha^T_{k,n}$ (left) and odds ratio $\hat D_{T,k,n}$ estimates ($k = 1, \ldots, 124$) (right) marking the values at $k = 100$. Bottom: quantile estimates $\hat q^T_{0.01,k,n}$ (left) and endpoint estimates $\hat T_{k,n}$ (right) contrasted with the method of moments quantile and endpoint estimators, in (33) and (34), respectively.

Another application can be found in statistical modelling of size distributions of forest fires. Power law distributions have appeared in the literature for modelling such sizes, while Reed and McKelvey (2002) provided evidence that in some circumstances this is too simple to describe such distributions over their full range. We consider here two data sets: the sizes of wildfires in Alberta (Canada) from 1998-2013 ($n = 2296$), which can be found on http://wildfire.alberta.ca/wildfire-maps/historical-wildfire-information/spatial-wildfire-data.aspx, and a data set, recently considered in Gomes et al. (2012) and in Brilhante et al. (2013), containing the number of hectares, exceeding 100 ha, burnt during wildfires recorded in Portugal from 1990 till 2003 ($n = 2627$). Both tails are analyzed, in Figure 3 for the Alberta data set and in Figure 4 for the Portuguese data. While for the Portugal case study light truncation or Pareto-type behaviour cannot be rejected, the rough truncation model fits the Alberta wildfires data much better than the unbounded model.


[Figure 3: panels with titles "Alberta Wildfires: Pa QQ-plot n = 2296", "Alberta Wildfires: TPa QQ-plot n = 2296" (horizontal axis $\log(X_{n-j+1,n})$, $j = 1, \ldots, 2232$) and "Wildfires burnt area/ha (Alberta, 1998-2013), n = 2296".]

Figure 3: Alberta wildfires. Top: Pa QQ-plot (left); TPa QQ-plot (right) using the top $k^* = 2232$ data. Middle: plots of p-values for $T_A$ and $T_B$ tests (left), and Pa index $\hat\alpha^T_{k,n}$ estimates (right) contrasted with the method of moments estimates in (32). Bottom: quantile estimates (left) and endpoint estimates $\hat T_{k,n}$ (right) contrasted with the method of moments quantile and endpoint estimators, in (33) and (34), respectively.
For the Alberta data set, in Figure 3 (top right), the TPa QQ-plot in (26), associated with the validity of (15), has been built on the chosen value $k^* = 2232$, which maximizes the correlation between $\log X_{n-j+1,n}$ and $\log(\hat D_{T,k,n} + j/n)$, $j = 1, \ldots, k$, for $10 < k < n$. For the Pareto index $\alpha$, high quantile and right endpoint estimation in Figure 3 we get conclusions similar to the ones of the earthquake fatalities data set.


[Figure 4: panels with titles "Wildfires burnt area/ha (Portugal, 1990-2003): Pa QQ-plot, n = 2627" and "Portugal Wildfires: Tests".]

Figure 4: Portugal wildfires. Top: Pa QQ-plot. Middle: plots of p-values for $T_A$ and $T_B$ tests (left), and odds ratio $\hat D_{T,k,n}$ estimates (right). Bottom: Pa index estimates (left) and quantile estimates $\hat q_{0.0005,k,n}$ (right) contrasted with the method of moments index and quantile estimators, in (32) and (33), respectively.
The finite sample behaviour of the proposed estimators, $\hat\alpha^T_{k,n}$ based on (6) and (8) and the quantile estimator $\hat q^T_{p,k,n}$ from (21) and (23), has been studied through an extensive Monte Carlo simulation procedure with 1000 runs, both for truncated and non-truncated Pa-type distributions. Here we will only present results concerning Pa and Burr distributions, with truncated and non-truncated versions with sample size n = 400:

1. Non-truncated models

(a) Pareto($\alpha$), $\alpha = 0.5, 2$

$$F_T(x) = F_W(x) = 1 - x^{-\alpha}, \quad x > 1,\ \alpha > 0, \qquad (35)$$

(b) Burr($\alpha, \rho$), $\alpha = 2$, $\rho = -1$

$$F_T(x) = F_W(x) = 1 - \left(1 + x^{-\rho\alpha}\right)^{1/\rho}, \quad x > 0,\ \rho < 0,\ \alpha > 0. \qquad (36)$$

2. Truncated models

(a) Truncated-Pareto($\alpha, T$), $\alpha = 0.5, 2$ and $T$ a high quantile from the corresponding Pareto model (35)

$$F_T(x) = \frac{1 - x^{-\alpha}}{1 - T^{-\alpha}}, \quad 1 < x < T,\ \alpha > 0. \qquad (37)$$

(b) Truncated-Burr($\alpha, \rho, T$), $\alpha = 2$, $\rho = -1$ and $T$ a high quantile from the corresponding Burr model in (36)

$$F_T(x) = \frac{1 - \left(1 + x^{-\rho\alpha}\right)^{1/\rho}}{1 - \left(1 + T^{-\rho\alpha}\right)^{1/\rho}}, \quad 0 < x < T,\ \rho < 0,\ \alpha > 0. \qquad (38)$$

Note that in case of (36) and (38), $\ell_U(y) = 1 + y^{\rho}(\alpha\rho)^{-1}(1 + o(1))$ when $y \to \infty$.


For a particular data set from an unknown but apparently heavy-tailed distribution, the practitioner does not know if the distribution comes from a truncated or a non-truncated Pa-type distribution and hence we have to study the behaviour of the tests and the proposed estimators under both cases, and compare them with the existing estimators where appropriate. Our simulation results illustrate this, using three columns in Figures 5-7, setting the cases where $T$ equals the 90 and 99 percentile of the corresponding non-truncated Pa random variable $W$ next to the case where $T = \infty$. The case $T = Q_W(0.90)$ serves as an illustration of the rough truncation case, while $T = Q_W(0.99)$ is meant to represent light truncation.
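Samples from the models (35)-(38) can be generated by inverse transform sampling, since $X =_d W \mid W < T$ has quantile function $Q_W(u\, F_W(T))$. The following sketch is only one possible way to set up such a simulation and is not the authors' code: the function names, the seed and the use of Python's `random` module are assumptions.

```python
import random


def pareto_quantile(u, alpha):
    """Quantile function of the Pareto model (35)."""
    return (1.0 - u) ** (-1.0 / alpha)


def burr_quantile(u, alpha, rho):
    """Quantile function of the Burr model (36), with rho < 0."""
    return ((1.0 - u) ** rho - 1.0) ** (-1.0 / (rho * alpha))


def truncated_sample(quantile, level, n, rng):
    """Draw n values from W | W < T with T = quantile(level).

    Uses X = Q_W(U * F_W(T)) with U uniform on [0, 1) and F_W(T) = level.
    """
    return [quantile(rng.random() * level) for _ in range(n)]


rng = random.Random(2024)  # fixed seed, chosen arbitrarily for reproducibility
rough = truncated_sample(lambda u: pareto_quantile(u, 0.5), 0.90, 400, rng)
light = truncated_sample(lambda u: burr_quantile(u, 2.0, -1.0), 0.99, 400, rng)
print(max(rough), pareto_quantile(0.90, 0.5))
print(max(light), burr_quantile(0.99, 2.0, -1.0))
```

The truncation points obtained in this way agree with the values shown in the panel titles of Figures 5-7, for instance $T = 100$ for Pareto($\alpha = 0.5$) at the 90% level and $T \approx 9.9499$ for the Burr model at the 99% level.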
[Figure 5: panels with titles "Trunc_Pareto(a = 0.5, T = 100), n = 500, runs = 1000", "Trunc_Pareto(a = 0.5, T = 10000), n = 500, runs = 1000" and "Pareto(a = 0.5), n = 500, runs = 1000".]

Figure 5: Pa($\alpha = 0.5$): Left column: $T = Q_W(0.90)$; Middle column: $T = Q_W(0.99)$; Right column: $T = \infty$. Means of p-values of tests based on $T_A$ and $T_B$ (first row). Estimation of $\alpha$ using the Newton-Raphson procedure with initial value $1/H_{k,n}$: mean (second row) and $\sqrt{MSE}$ (third row). Estimation of the high quantile $q_{0.002}$ using the proposed and competing quantile estimators: means (fourth row) and $\sqrt{MSE}$ (fifth row).
The means of the p-values show that both tests $T_A$ and $T_B$ strongly reject the hypotheses $H_0^{(1)}$ and $H_0^{(2)}$ in case $T = Q_W(0.90)$. In the case $T = Q_W(0.99)$ test $T_A$ rejects $H_0^{(1)}$ more clearly, in contrast to the test $T_B$ which then appears more appropriate to test $H_0^{(2)}$. In case of a Burr distribution for $W$ with $T = Q_W(0.99)$, $T_B$ rejects more often for $k > 100$ as the deviation from the Pa-model begins to set in for this Pa-type distribution $W$.

The $\alpha$-estimator $\hat\alpha^T_{k,n}$ represented in Figures 5-7 is the solution of (6), approximated using the Newton-Raphson iteration as in (8), with an initial value $\hat\alpha^{(0)}_{k,n} = 1/H_{k,n}$. With finite samples and fixed $T$, TPa-type distributions belong to the Weibull domain of attraction for maxima with EVI $\xi = -1$, so that the moment estimator in (32) almost surely converges to $-1$. Also, for these models, $1/H_{k,n}$ does not constitute a consistent estimator either for $\alpha$ or for $1/\xi$, since in case $\xi < 0$ the Hill estimator $H_{k,n}$ almost surely tends to zero when $k/n \to 0$ as $k, n \to \infty$. Only when $T = \infty$ do $\hat\alpha^T_{k,n}$ and $1/H_{k,n}$ estimate the same value $1/\xi$. In case of the truncated Burr distribution the Pareto index estimator $\hat\alpha^T_{k,n}$ is underestimating $\alpha$, which is not uncommon in extreme value analysis, and is in fact comparable to the behaviour of $1/H_{k,n}$ in this case.


When estimating an extreme quantile the estimator in (33) based on the moment estimator is designed both for truncated and non-truncated cases and is to be compared with the estimation procedure defined in (23). Finally $\hat q^{\infty}_{p,k,n}$ and the Weissman (1978) extreme quantile estimator $\hat q^W_{p,k,n}$ from (24) are competitors in case of non-truncated Pa-type distributions only.

The convergence of the new quantile estimators seems to be attained at low thresholds (or high $k$) with high accuracy, contrasting with higher thresholds (or low $k$) for MOM class estimators. With quantile estimation an erratic behaviour appears under rough truncation in Figures 6-7 for some smaller values of $k$. This is a consequence of the use of $\hat D_T^{(0)} = \max\{\hat D_T, 0\}$ rather than $\hat D_T$ in practice. If we assume that $T$ is finite then using simply $\hat D_T$ rather than $\hat D_T^{(0)}$ produces much smoother performance in extreme quantile estimation. On the other hand, in case of non-truncated models the use of $\hat D_T$ instead of $\hat D_T^{(0)}$ leads to extreme quantile estimates that are quite sensitive with respect to the value of $\hat D_T$. While the stable parts in the plots of quantile estimates are readily apparent anyway, we here use $\hat D_T^{(0)}$ in (21).


In case of non-truncated Pa-type models (right columns) concerning high quantile estimation, taking into account that $\hat q^{\infty}_{p,k,n}$ and $\hat q^W_{p,k,n}$ are designed for this particular situation, we can conclude that the newly proposed estimators perform reasonably well at $p$ around $1/n$ if we compare with the classical Weissman and moment-type extreme quantile estimators. For instance in case of the Burr distribution in Figure 7 it appears that our quantile estimator is slightly worse than the moment estimator but better than the Weissman (1978) estimator. Finally note that for quantile estimators the relative $\sqrt{MSE}$ values can be obtained by dividing the presented absolute error $\sqrt{MSE}$ by the exact value of $Q(1 - p)$.

[Figure 6: panels with titles "Trunc_Pareto(a = 2, T = 3.1623), n = 500, runs = 1000", "Trunc_Pareto(a = 2, T = 10), n = 500, runs = 1000" and "Pareto(a = 2), n = 500, runs = 1000"; horizontal axis $k$ from 0 to 500; legend entry "MOM estimator".]

Figure 6: Pa($\alpha = 2$): Left column: $T = Q_W(0.90)$; Middle column: $T = Q_W(0.99)$; Right column: $T = \infty$. Means of p-values of tests based on $T_A$ and $T_B$ (first row). Estimation of $\alpha$ using the Newton-Raphson procedure with initial value $1/H_{k,n}$: mean (second row) and $\sqrt{MSE}$ (third row). Estimation of the high quantile $q_{0.002}$ using the proposed and competing quantile estimators: means (fourth row) and $\sqrt{MSE}$ (fifth row).
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Trunc_Burr(a = 2, p = -1, T = 3), n = 500, runs = 1000 Trunc_Burr(a = 2, p = -1, T = 9.9499), n = 500, runs = 1000 




Burr(o = 2, p = -1), n = 500, runs = 1000 



Trunc_Burr(a = 2, p = -1, T = 3), n = 500, runs = 1000 Trunc_Burr(a = 2, p = -1, T = 9.9499), n = 500, runs = 1000 
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Figure 7: Burr(Q: = 2, p = —1): Left column: T = Qv^/(0.90); Middle column: T = Qp^(0.99); Right 
column: T = oo. Means of p-values of tests based on Ta and Tb (first row). Estimation of a using the 
Newton-Raphson procedure with initial value mean (second row) and VMSE (third row). 

Estimation of the high quantile <70.002 using < 7 ^ 002 , fc,n’ ^^002ffc,n’ ^0d)02,fc,n 0 -^^^ column): means 

(fourth row) and \/MSE (fifth row). 
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6 Asymptotics of estimators and tests 


In this section we state the large sample distribution of $1/\hat\alpha_{k,n}$ and $\hat q_{p,k,n}$ (the latter defined in (21)) for TPa-type distributions both under rough (12) and light (13) truncation. To this end we will make use of the expressions (14)-(15) and (16) for the upper quantile function $Q_T(1-p)$ both under rough and light truncation, assuming that $F_T$ is continuous. We also make use of a second order slow variation condition on $\ell_U$ specifying the rate of convergence of $\ell_U(tx)/\ell_U(x)$ to 1 as $x \to \infty$, which is used typically in all asymptotic results in extreme value methods (see for instance Theorem 3.2.5 in de Haan and Ferreira, 2006):

$$\lim_{x\to\infty}\frac{1}{b(x)}\log\frac{\ell_U(tx)}{\ell_U(x)} = h_\rho(t) \tag{39}$$

with $\rho < 0$, $h_\rho(t) = (t^\rho - 1)/\rho$, and $b$ regularly varying with index $\rho$, i.e. $b(tx)/b(x) \to t^\rho$ as $x \to \infty$ for every $t > 0$. Finally, $W$ represents a standard Wiener process.
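As a quick numerical illustration of condition (39), consider the hypothetical slowly varying function $\ell_U(x) = C(1 + Dx^\rho)$, which is not taken from the paper. For this choice $\log\{\ell_U(tx)/\ell_U(x)\} \approx Dx^\rho(t^\rho - 1)$, so that (39) holds with $b(x) = \rho D x^\rho$. The sketch below checks this for one set of arbitrary parameter values.

```python
import math


def h_rho(t, rho):
    """h_rho(t) = (t**rho - 1) / rho, as in condition (39)."""
    return (t ** rho - 1.0) / rho


def log_ratio(t, x, d, rho):
    """log( l_U(t x) / l_U(x) ) for the hypothetical l_U(x) = C (1 + d x**rho)."""
    return math.log1p(d * (t * x) ** rho) - math.log1p(d * x ** rho)


def b(x, d, rho):
    """Rate function b(x) = rho * d * x**rho matching this hypothetical l_U."""
    return rho * d * x ** rho


rho, d, t, x = -1.0, 1.0, 2.0, 1.0e6
lhs = log_ratio(t, x, d, rho) / b(x, d, rho)
rhs = h_rho(t, rho)
print(lhs, rhs)
```

The constant $C$ cancels in the ratio, which is why it does not appear in the code.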

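Theorem 1. Let (39) hold and let $n, k = k_n \to \infty$, $k/n \to 0$, $T \to \infty$. Then
(a) if $k/(nD_T) \to \kappa \in (0, \infty)$,

$$1/\hat\alpha_{k,n} - 1/\alpha = \left(\frac{1}{\alpha\sqrt{k}}\,\frac{N_\kappa}{\delta_\kappa} + b(1/\bar F_W(T))\,\frac{\beta_\kappa}{\delta_\kappa}\right)(1 + o_p(1)),$$

with asymptotic variance $1/(k\,\delta_\kappa\,\alpha^2)$ and

$$N_\kappa = \left(1 - \frac{1}{\kappa}\log(1+\kappa)\right)W(1) - \int_0^1 W(u)\,d\log(1+\kappa u),$$

$$\beta_\kappa = A_\kappa - B_\kappa C_\kappa,$$

$$A_\kappa = \int_0^1 h_\rho([1+\kappa u]^{-1})\,du - h_\rho([1+\kappa]^{-1}),$$

$$B_\kappa = h_\rho([1+\kappa]^{-1}),$$

$$C_\kappa = \frac{1}{\kappa} - \frac{1+\kappa}{\kappa^2}\log(1+\kappa),$$

$$\delta_\kappa = 1 - \frac{1+\kappa}{\kappa^2}\log^2(1+\kappa);$$

(b) if $k/(nD_T) \to \infty$ and $D_T\log(k/(nD_T)) = o((n/k)^{-1+\rho})$,

$$1/\hat\alpha_{k,n} - 1/\alpha = \left(\frac{1}{\alpha\sqrt{k}}\,N_k + b(n/k)(1-\rho)^{-1}\right)(1 + o_p(1)),$$

with $N_k = W(1) - \int_0^1 (W(u)/u)\,du \sim \mathcal{N}(0,1)$.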


Remark 1. In case $k/(nD_T) \to \infty$ the asymptotic result for $1/\hat\alpha_{k,n}$ is identical to that of the Hill estimator $H_{k,n}$ under a Pa-type distribution, as given for instance in Beirlant et al. (2004), section 4.2.
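The variance claims of Theorem 1 can be checked numerically. The sketch below assumes the forms $N_\kappa = (1 - \kappa^{-1}\log(1+\kappa))W(1) - \int_0^1 W(u)\,d\log(1+\kappa u)$ and $\delta_\kappa = 1 - (1+\kappa)\kappa^{-2}\log^2(1+\kappa)$, and uses $E(W(s)W(t)) = \min(s,t)$ to compute $\mathrm{Var}(N_\kappa)$ by a midpoint rule. Under these assumptions $\mathrm{Var}(N_\kappa) = \delta_\kappa$, which gives the asymptotic variance $1/(k\delta_\kappa\alpha^2)$, and $\delta_\kappa \to 1$ as $\kappa \to \infty$, in line with the Hill estimator case of Remark 1. Function names and grid sizes are illustrative choices.

```python
import math


def delta(kappa):
    """delta_kappa = 1 - (1 + kappa) * log(1 + kappa)**2 / kappa**2."""
    return 1.0 - (1.0 + kappa) * math.log1p(kappa) ** 2 / kappa ** 2


def var_n_kappa(kappa, m=200):
    """Midpoint-rule value of Var(N_kappa) using Cov(W(s), W(t)) = min(s, t)."""
    a = 1.0 - math.log1p(kappa) / kappa          # coefficient of W(1)
    u = [(i + 0.5) / m for i in range(m)]
    w = [kappa / (1.0 + kappa * x) for x in u]   # density of d log(1 + kappa u)
    # Cov(W(1), int W(u) dlog(1 + kappa u)) = int u * w(u) du
    cross = sum(x * wx for x, wx in zip(u, w)) / m
    # Var(int W(u) dlog(1 + kappa u)) = double integral of min(u, v) w(u) w(v)
    double = sum(min(u[i], u[j]) * w[i] * w[j]
                 for i in range(m) for j in range(m)) / m ** 2
    return a * a - 2.0 * a * cross + double


for kappa in (1.0, 5.0):
    print(kappa, var_n_kappa(kappa), delta(kappa))

# delta_kappa approaches 1 (the Hill estimator case) as kappa grows.
print(delta(1.0e6))
```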


Theorem 2. Suppose (39) holds and $n, k = k_n \to \infty$, $k/n \to 0$, $T \to \infty$ and $p = p_n$ such that $np_n = o(k)$. Moreover, $E$ denotes a standard exponential random variable.

(a) Let $k/(nD_T) \to \kappa$. Then



Q;(/c -|- 1) 

+Kb{l/Fw{T)) 


k + 1 


(1 + Op(l)). 


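(b) Let $np_n \to \infty$, $\log(np_n) = o(\sqrt{k})$, $\sqrt{k}\,nD_T/\log(k/(np_n)) \to 0$, and $b(n/k)\log(k/(np_n)) \to 0$. Then

$$\log \hat q_{p,k,n} - \log q_p = \left(\frac{N_k}{\alpha\sqrt{k}} + b(n/k)(1-\rho)^{-1}\right)\log\left(\frac{k}{np_n}\right)(1 + o_p(1)) - \frac{1}{\alpha}\,\frac{1}{np_n}\,(E - 1 + o_p(1)).$$

If $np_n \to c > 0$ then

$$\log \hat q_{p,k,n} - \log q_p = -\frac{1}{\alpha c}(E - 1)(1 + o_p(1)).$$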

Remark 2. In case $k/(nD_T) \to \kappa$, both the asymptotic bias and the stochastic part of $\hat q_{p,k,n}$ are of smaller order than in the case of $1/\hat\alpha_{k,n}$. This confirms the findings of the simulations, where the plots of the quantile estimators are found to be quite horizontal as a function of k and show a small variance, compared to other extreme quantile estimators in this case.

In case $k/(nD_T) \to \infty$, note that the quantile estimator is consistent if $(np_n)^{-1} \to 0$, that is, for quantiles $q_p$ situated maximally up to the border of the sample $q_{1/n}$, using for instance a sequence of the type $p_n = (\log k)^\tau/n$ for some $\tau > 0$. The extra factor in (23) compared to the Weissman estimator induces this restriction. The first term in the expansion of $\log \hat q_{p,k,n}$ in Theorem 2(b) is indeed the asymptotic expansion of the Weissman estimator as given for instance in Beirlant et al. (2004), section 4.6. If $np_n \to \infty$ fast enough for this first term to dominate, then the expansion of the Weissman estimator is dominant, while otherwise the second term in the expansion is to be retained.


From Theorem 2 it follows that it is important to be able to test whether rough or light truncation holds for a given case study. Indeed, if rough truncation holds then in extreme quantile estimation $\hat q^{T}_{p,k,n}$ from (21) should be used, while under light truncation the estimator $\hat q^{\infty}_{p,k,n}$ from (25), or the classical Weissman estimator (24), or the moment type extreme quantile estimator $\hat q^{MOM}_{p}$ can be used when extrapolating outside the sample. From Theorem 3(a) the consistency of both tests (27) and (31) follows directly, while the null distributions are considered in parts (b) and (c).


Theorem 3. Suppose (39) holds and $n, k = k_n \to \infty$, $k/n \to 0$, $T \to \infty$.

(a) Let $k/(nD_T) \to \kappa$. Then

$$\frac{\log R_{k,n}}{H_{k,n}} \to_p \frac{\kappa\log(1+\kappa)}{\log(1+\kappa) - \kappa},$$

$$E_{k,n}(1/H_{k,n}) \to_p \frac{(1+\kappa)\{\kappa - \log(1+\kappa)\}}{\kappa\{2\kappa - \log(1+\kappa)\}}\left\{1 - (1+\kappa)^{-\frac{2\kappa - \log(1+\kappa)}{\kappa - \log(1+\kappa)}}\right\}.$$

(b) Let $nD_T \to 0$. Then

$$T_{A,k,n} \to_d E,$$

where $E$ is a standard exponential random variable.

(c) Let $nD_T/k \to 0$ and $\sqrt{k}\,(n/k)^{\rho} = o(1)$. Then

$$T_{B,k,n} \to_d \mathcal{N}(0,1).$$

Note that the result in Theorem 3(b) needs a stronger condition on T for the limit $E$ to hold under $H_0$, meaning that for the test based on $T_A$ the truncation point T has to lie higher than in the case of $T_B$ in order to keep a given significance level. This yields a theoretical confirmation of the simulation results, where the $T_A$ test was found to reject light truncation situations that deviate from the untruncated Pa-type distributions sooner than the $T_B$ test.

7 Conclusion 

We have extended the work on estimating the Pareto index α under truncation from Aban et al. (2006) and Nuyts (2010) to extreme quantile estimation, and also considered truncation of regularly varying tails. The main proposals and findings are:

• The new estimator of the Pareto index α is effective whether the underlying distribution is truncated or not, thus unifying previous approaches. Although based on a truncated model, the estimator of α is competitive even when the underlying distribution is unbounded.

• Our method leads to new quantile estimators which are especially effective in the case of rough truncation. In case the data come from a lightly truncated Pa-type distribution, which is the case when the Pa QQ-plot is linear in the right tail, the extreme quantile estimator should not be used for extrapolation far out of range of the available observations, as discussed in Remark 2.

• A new TPa QQ-plot is constructed that can assist in verifying the validity of the TPa-type model. Moreover, a new test is provided for testing light truncation against rough truncation, which offers an extra tool for practitioners.

Acknowledgment. The authors thank Mark Meerschaert for helpful discussions and many suggestions that had a significant positive influence on the paper.

Tail fitting for truncated and non-truncated Pareto-type distributions

Supplementary material: Proofs of the asymptotic results

Beirlant, Fraga Alves, Gomes

Department of Mathematics and Leuven Statistics Research Center, KU Leuven
Department of Statistics and Operations Research, University of Lisbon

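Proof of Theorem 1. The mean value theorem implies that $1/\hat\alpha_{k,n} - 1/\alpha = -f(1/\alpha)/f'(1/\tilde\alpha)$, where $\tilde\alpha$ is between $\alpha$ and $\hat\alpha_{k,n}$, with

$$f(x) = H_{k,n} - x - \frac{R_{k,n}^{1/x}\log R_{k,n}}{1 - R_{k,n}^{1/x}}.$$

Then the limit distribution of $1/\hat\alpha_{k,n}$ is found from the asymptotic distribution of

$$H_{k,n} - \frac{1}{\alpha} - \frac{R_{k,n}^{\alpha}\log R_{k,n}}{1 - R_{k,n}^{\alpha}}.$$

Hence the asymptotic behaviour of $H_{k,n}$ and $\log R_{k,n}$ constitute essential building blocks in the derivation of the asymptotics for $1/\hat\alpha_{k,n}$. We consider these in the following Propositions.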

For this we make use of the result (see de Haan and Ferreira, 2006, 7.2.12) that for some standard Wiener process $W$ (with $E(W(s)W(t)) = \min(s,t)$) we have uniformly over all $j = 1, \ldots, k$, as $k, n \to \infty$, $k/n \to 0$,

$$\sqrt{k}\left(\frac{n}{k}U_{j,n} - \frac{j}{k}\right) - W\left(\frac{j}{k}\right) \to_p 0. \tag{S1}$$


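Proposition 1. Let (39) hold and let $n, k = k_n \to \infty$, $k/n \to 0$, $T \to \infty$. Then
(a) if $k/(nD_T) \to \kappa$,

$$H_{k,n} = \frac{1}{\alpha}\left(1 - \frac{nD_T}{k}\log\left(1 + \frac{k}{nD_T}\right)\right) + \frac{1}{\alpha\sqrt{k}}\left(W(1)\,\frac{k/(nD_T)}{1 + k/(nD_T)} - \int_0^1 W(u)\,d\log\left(1 + \frac{k}{nD_T}u\right)\right)(1 + o_p(1)) + b(1/\bar F_W(T))\,A_{k,n,T}\,(1 + o_p(1)),$$

where

$$A_{k,n,T} = \int_0^1 h_\rho\left(\left[1 + \frac{k}{nD_T}u\right]^{-1}\right)du - h_\rho\left(\left[1 + \frac{k}{nD_T}\right]^{-1}\right);$$

(b) if $k/(nD_T) \to \infty$,

$$H_{k,n} = \frac{1}{\alpha} + \frac{1}{\alpha\sqrt{k}}\left(W(1) - \int_0^1 \frac{W(u)}{u}\,du\right)(1 + o_p(1)) + b(n/k)\int_0^1 h_\rho(u^{-1})\,du\,(1 + o_p(1)) - \frac{1}{\alpha}\,\frac{nD_T}{k}\log\left(\frac{k}{nD_T}\right)(1 + o_p(1)),$$

where the first two terms in this expansion are the limits for $k/(nD_T) \to \infty$ of the first two lines in the expansion in case (a).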


Proof. Let $U_{1,n} \le U_{2,n} \le \cdots \le U_{n,n}$ denote the order statistics from an i.i.d. sample of size $n$ from the uniform (0,1) distribution. Then using summation by parts and the fact that $X_{n-j+1,n} =_d Q_T(1 - U_{j,n})$ $(j = 1, \ldots, n)$,

$$H_{k,n} = \frac{1}{k}\sum_{j=1}^{k} j\,(\log X_{n-j+1,n} - \log X_{n-j,n}) =_d \frac{1}{k}\sum_{j=1}^{k} j\,(\log Q_T(1 - U_{j,n}) - \log Q_T(1 - U_{j+1,n})) = -\sum_{j=1}^{k}\frac{j}{k}\int_{U_{j,n}}^{U_{j+1,n}} d\log Q_T(1 - w).$$
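The summation-by-parts step can be checked on data. The sketch below compares the usual form of the Hill estimator, $H_{k,n} = \frac{1}{k}\sum_{j=1}^{k}\log X_{n-j+1,n} - \log X_{n-k,n}$, with the weighted log-spacings form $\frac{1}{k}\sum_{j=1}^{k} j(\log X_{n-j+1,n} - \log X_{n-j,n})$. The sample (a simulated strict Pareto sample with a fixed seed), the sizes and the function names are illustrative assumptions.

```python
import math
import random


def hill_direct(x_sorted, k):
    """Mean log-excess over the threshold X_{n-k,n}; x_sorted is ascending."""
    n = len(x_sorted)
    threshold = math.log(x_sorted[n - k - 1])            # log X_{n-k,n}
    return sum(math.log(x_sorted[n - j]) - threshold     # log X_{n-j+1,n}
               for j in range(1, k + 1)) / k


def hill_spacings(x_sorted, k):
    """Same estimator written as weighted log-spacings (summation by parts)."""
    n = len(x_sorted)
    return sum(j * (math.log(x_sorted[n - j]) - math.log(x_sorted[n - j - 1]))
               for j in range(1, k + 1)) / k


random.seed(0)
sample = sorted(random.paretovariate(2.0) for _ in range(500))
k = 50
h1 = hill_direct(sample, k)
h2 = hill_spacings(sample, k)
print(h1, h2)
```

Both functions return the same value up to floating-point rounding, since the identity is purely algebraic and does not depend on the distribution of the sample.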


Using (S1), $H_{k,n}$ can now be approximated as $k, n \to \infty$ by the integral


“Uo ” 


^n+yV(n)/\/fc+l/fc ^ 1 

/ (ilog(5T(l-re) > 

/n+yV’(n)/vT ^ J 


Using the mean value theorem on the inner integral between $u + W(u)/\sqrt{k}$ and $u + W(u)/\sqrt{k} + 1/k$, followed by an integration by parts, we obtain the approximation



(S2) 


First, let $k/(nD_T) \to \kappa$. Then from (15) the approximation (S2) of $H_{k,n}$ equals


— log I 1 + 
a \ tiVt 


Dt\ Vk JJ 


logiu (^{l/Fw{T)) 
- [\og(l+ ^ 

Jo \ 


1+ ‘ (i+mv 

tiDt V ^/k J 


-V 


a 


nDj' 


u + 


W{u] 


\/k 


+ f^^ogeu Ul/Fw{T)) 


1 + 


TiDj' 


du 


u + 


W{u) 

Vk 


-F 


du. 


Next, add and subtract $\log \ell_U(1/\bar F_W(T))$ from the second and fourth line respectively, and use the approximations

k 

, k- \ vy[ii.\ — 

= log 1 + 


log 1 + 


k 


u + 


W{u) 


^ u] + 


TiDj' 


nDj' 


nDx V Vk 

with 0 < « < 1 and u* between u and u + W{u)/\/k, and 

fc((l/F„,(r))[l + 

log- 


Vk 1 + 


k 


uDt 


iuO/FwiT)) 

= b{l/Fw{T))hJ[l + 


k 


^ nDj' 

Finally, using partial integration we have 


u 


]-M(i + o,(i)). 


$$\int_0^1 \log\left(1 + \frac{k}{nD_T}u\right)du = -1 + \log\left(1 + \frac{k}{nD_T}\right) + \frac{nD_T}{k}\log\left(1 + \frac{k}{nD_T}\right),$$

from which one obtains the stated result in (a).
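The partial integration step can be verified numerically. Writing $\kappa$ for $k/(nD_T)$, the identity reads $\int_0^1 \log(1+\kappa u)\,du = -1 + \log(1+\kappa) + \kappa^{-1}\log(1+\kappa)$, and subtracting it from $\log(1+\kappa)$ gives the leading term $\alpha^{-1}(1 - \kappa^{-1}\log(1+\kappa))$ of $H_{k,n}$ under rough truncation. The sketch below uses a plain midpoint rule; the function names and the values of $\kappa$ are arbitrary illustrative choices.

```python
import math


def midpoint_integral(kappa, m=10000):
    """Midpoint-rule approximation of the integral of log(1 + kappa*u) over [0, 1]."""
    return sum(math.log1p(kappa * (i + 0.5) / m) for i in range(m)) / m


def closed_form(kappa):
    """-1 + log(1 + kappa) + log(1 + kappa) / kappa."""
    return -1.0 + math.log1p(kappa) + math.log1p(kappa) / kappa


def hill_limit(alpha, kappa):
    """Leading term of H_{k,n} under rough truncation: (1 - log(1+kappa)/kappa) / alpha."""
    return (1.0 - math.log1p(kappa) / kappa) / alpha


for kappa in (0.5, 3.0, 50.0):
    print(kappa, midpoint_integral(kappa), closed_form(kappa))

print(hill_limit(2.0, 3.0))
```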

Secondly, consider $k/(nD_T) \to \infty$. Then using (16) the proof follows the lines of the proof of the asymptotic normality of the Hill estimator, as for instance given in Mason and Turova (1994), except for the extra term

$$-\frac{1}{\alpha}\,\frac{1}{k}\sum_{j=1}^{k}\log\frac{1 + D_T/U_{j,n}}{1 + D_T/U_{k+1,n}}$$

appearing from the factor $[1 + yD_T]^{-1/\alpha}$ in (16). However, using $U_{j,n}/(j/(n+1)) \to_p 1$, one derives that this term equals

$$-\frac{1}{\alpha}\,\frac{nD_T}{k}\log\left(\frac{k}{nD_T}\right)(1 + o_p(1)).$$

Condition $D_T\log(k/(nD_T)) = o((n/k)^{-1+\rho})$ in Theorem 1(b) entails that the bias term due to the factor $(1 + D_T y)^{-1/\alpha}$ in (16) is negligible with respect to the classical bias $b(n/k)/(1-\rho)$ due to the last factor in (16). □


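Proposition 2. Let (39) hold and let $n, k = k_n \to \infty$, $k/n \to 0$, $T \to \infty$. Then
(a) if $k/(nD_T) \to \kappa$,

$$\log R_{k,n} = -\frac{1}{\alpha}\log\left(1 + \frac{k}{nD_T}\right) - \frac{1}{\alpha\sqrt{k}}\,W(1)\,\frac{k/(nD_T)}{1 + k/(nD_T)}\,(1 + o_p(1)) + b(1/\bar F_W(T))\,B_{k,n,T}\,(1 + o(1)),$$

where

$$B_{k,n,T} = h_\rho\left(\left[1 + \frac{k}{nD_T}\right]^{-1}\right);$$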


(b) if $k/(nD_T) \to \infty$,


log Rk,n 


1 1 11 + k 

-log A:- {Ek,k - log k) + - log -- 

a a a 14- 

k 

-b{n/k) + hp {[k~^ + {nDT/k)]~^(j (1 + o(l)), 


where $E_{k,k}$ denotes the maximum of a sample of size $k$ from the standard exponential distribution.


Proof The proof of (a) follows similar lines as the proof of Proposition 1(a). 
Concerning part (b) remark that using (16) 


log Rk,n 


1 1 Uin 
-log-- 


1 1 + +^ 

a 1 + ^k 


+ log 


ty (CT(«/t)[l + 


The result then follows since log E^^k^ ^md by using (39). □ 

Proof of Theorem 1 (cont'd). First we derive the consistency of $\hat\alpha_{k,n}$ under the conditions of Theorem 1, so that then $\tilde\alpha \to_p \alpha$. Aban et al. (2006, see A.4) showed that

$$f(t) := \frac{1}{t} + \frac{R_{k,n}^{t}\log R_{k,n}}{1 - R_{k,n}^{t}} - H_{k,n}$$

is a decreasing function in $t \in (0, \infty)$. Moreover $\lim_{t\to\infty} f(t) = -H_{k,n} < 0$ and $\lim_{t\to 0} f(t) = -(\log R_{k,n})/2 - H_{k,n}$. Showing that asymptotically under the conditions of the theorem $-(\log R_{k,n})/2 - H_{k,n} > 0$ using Propositions 1 and 2 in both cases (a) and (b), we then have that there is a unique solution to the equation $f(t) = 0$. Note with Propositions 1 and 2 that for the true value $\alpha$ we have $f(\alpha) = o_p(1)$, since $H_{k,n}$ and $1/\alpha + R_{k,n}^{\alpha}\log R_{k,n}/(1 - R_{k,n}^{\alpha})$ are asymptotically equal, namely to $\frac{1}{\alpha}\left(1 - \frac{nD_T}{k}\log\left(1 + \frac{k}{nD_T}\right)\right)$ in case (a), and to $\alpha^{-1}$ in case (b). So the true value $\alpha$ asymptotically is a solution, from which the consistency follows.
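The monotonicity and the two limits of $f$ make the estimating equation easy to solve by bracketing. The following is a minimal sketch using bisection rather than the Newton-Raphson procedure used in the paper; it assumes $f(t) = 1/t + R_{k,n}^{t}\log R_{k,n}/(1 - R_{k,n}^{t}) - H_{k,n}$, and the bracket $[10^{-6}, 10^{3}]$, the tolerance and the function names are arbitrary choices. The inputs are set to the rough-truncation limits of $H_{k,n}$ and $\log R_{k,n}$ for hypothetical values $\alpha = 2$ and $\kappa = 3$, so the root should be close to 2.

```python
import math


def f(t, h, log_r):
    """f(t) = 1/t + R^t log R / (1 - R^t) - H, with log_r = log R < 0."""
    one_minus_rt = -math.expm1(t * log_r)      # 1 - R^t, computed stably
    return 1.0 / t + (1.0 - one_minus_rt) * log_r / one_minus_rt - h


def solve_alpha(h, log_r, lo=1e-6, hi=1e3, tol=1e-12):
    """Bisection for the unique root of the decreasing function f on (lo, hi)."""
    if -log_r / 2.0 - h <= 0.0:
        raise ValueError("no positive root: need -log(R)/2 - H > 0")
    if f(hi, h, log_r) >= 0.0:
        raise ValueError("upper end of the bracket is too small")
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if f(mid, h, log_r) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical inputs: limits of H_{k,n} and log R_{k,n} for alpha = 2, kappa = 3.
alpha, kappa = 2.0, 3.0
log_r = -math.log1p(kappa) / alpha
h = (1.0 - math.log1p(kappa) / kappa) / alpha
alpha_hat = solve_alpha(h, log_r)
print(alpha_hat)
```

The initial check mirrors the existence condition $-(\log R_{k,n})/2 - H_{k,n} > 0$ stated in the proof; when it fails, the equation has no positive solution and the sketch raises an error instead of returning a value.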


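Now using Propositions 1 and 2 we obtain that

$$1 - \tilde\alpha^2\,\frac{R_{k,n}^{\tilde\alpha}\log^2 R_{k,n}}{(1 - R_{k,n}^{\tilde\alpha})^2} = \delta_{k,n,T}\,(1 + o_p(1)),$$

where

$$\delta_{k,n,T} = 1 - \frac{1 + k/(nD_T)}{(k/(nD_T))^2}\log^2\left(1 + \frac{k}{nD_T}\right).$$

Next, consider $g(H_{k,n}, \log R_{k,n}) = H_{k,n} - 1/\alpha - R_{k,n}^{\alpha}\log R_{k,n}/(1 - R_{k,n}^{\alpha})$ with

$$g(x, y) = x - \frac{1}{\alpha} - y\,\frac{e^{\alpha y}}{1 - e^{\alpha y}}. \tag{S3}$$

The Taylor approximation of $g(H_{k,n}, \log R_{k,n})$ around the asymptotic expected values $E_\infty H_{k,n}$ and $E_\infty \log R_{k,n}$ yields

$$g(E_\infty H_{k,n}, E_\infty \log R_{k,n}) + (H_{k,n} - E_\infty H_{k,n}) + \frac{\partial g}{\partial y}(E_\infty H_{k,n}, E_\infty \log R_{k,n})\,(\log R_{k,n} - E_\infty \log R_{k,n}) \tag{S4}$$

with, based on Proposition 2,

$$\frac{\partial g}{\partial y}(E_\infty H_{k,n}, E_\infty \log R_{k,n}) = -C_{k,n,T}\,(1 + o(1)), \tag{S5}$$

where

$$C_{k,n,T} = \frac{nD_T}{k} - \frac{1 + k/(nD_T)}{(k/(nD_T))^2}\log\left(1 + \frac{k}{nD_T}\right).$$

From Propositions 1 and 2, (S4), (S5), and (S3) we find that the stochastic part in the development of $1/\hat\alpha_{k,n}$ is given by

$$\frac{1}{\alpha\sqrt{k}\,\delta_{k,n,T}}\left(\left(1 - \frac{nD_T}{k}\log\left(1 + \frac{k}{nD_T}\right)\right)W(1) - \int_0^1 W(u)\,d\log\left(1 + \frac{k}{nD_T}u\right)\right).$$

Developing for $k/(nD_T) \to \kappa$, respectively $k/(nD_T) \to \infty$, leads to the stated asymptotic variances in cases (a) and (b).

From (S4), (S5), and the asymptotic bias expressions in Propositions 1 and 2 one finds the asymptotic bias expressions of $1/\hat\alpha_{k,n}$. For instance, in case $k/(nD_T) \to \kappa$ we find that

$$g(E_\infty H_{k,n}, E_\infty \log R_{k,n}) = b(1/\bar F_W(T))\,(A_{k,n,T} - B_{k,n,T}C_{k,n,T})\,(1 + o(1)). \tag{S6}$$

□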


Proof of Theorem 2. First consider the case $k/(nD_T) \to \kappa$. Then from (22)


log^L.n = log Xn-k,n + 


a 


log 


1 - 


k+1 


k.n 


TD^k,n 1 

-^k,n Ac+1 


a 


^ log ( 1 + 


k,n 


Dn 


1 + 
= log Xn,n - Y~ 


a 


k,n 


1 - 


k+l 


1 


a 


log 


k,n 


^ p(n + l) k + l \ 
k + l (n + 1)Dt j 


while 


logW,n = l0gl7T(l/f^l,n) 


1 


1^1,/, ^ + 1 El 


+ log 


iu [{l/FwiT))[l 


+ 


Ul,n 1—1 

Dt J 


£u (l/^w(r)) 


with $\bar E_{n+1}$ the average of an i.i.d. sample $E_1, \ldots, E_{n+1}$ from the standard exponential distribution, so that $U_{1,n} =_d E_1/((n+1)\bar E_{n+1})$. Furthermore

logl7T(l/p) = logT- -log b + ~^ l'b (l + 0 ( 1 )) 

a V k + l 


+ log 


iu ((l/Fm(T))[l + -t^K 


p(n+l) , 
k+l 


( 1 + <,(!))]-■) 


iu {l/Fw{T)) 
Now it follows from Propositions 1(a) and 2(a) that 


TD k,n 

^k,n 


1+^-1 and 

k + l 
so that, with


hp ^[1 + ^ + Op(l))] 

^p([l + ^|^«(l + Op(l))]-') 


- —El(l + Op(l)), 

(1 + Op(l))) 


I0g5^fc,n-I0g<5(l-P) 


from which (a) follows. 


+ - 


a k + 1 
K,p{n + 1) 


at 


1 


a k + 1 


k + 1 
- ^b{l/Fw{T)) 


p{n + 1 ) 

A; + 1 

El p{n + 1 ) 


A; + 1 A; + 1 


(1 + Op(l)), 


Next, in the case $k/(nD_T) \to \infty$, starting from expression (23) and Proposition 1(b), we obtain


log qp,k,n = log Xn-k,n+ log 1 + 


at 


(n + 1)Dt 
k + l 


a 




A; + 1 


A; + 1 


where under the given conditions

$$\frac{(n+1)\hat D_T}{k+1} = \frac{R_{k,n}^{\hat\alpha_{k,n}} - \frac{1}{k+1}}{1 - R_{k,n}^{\hat\alpha_{k,n}}} = \frac{1}{k+1}\left(e^{-[E_{k,k} - \log(k+1)]}(1 + o_p(1)) - 1\right) = \frac{1}{k+1}\left(E(1 + o_p(1)) - 1\right),$$

since, using Theorem 1(b) and Proposition 2(b), and the fact that $\exp(-[E_{k,k} - \log(k+1)])$ asymptotically is standard exponentially distributed,

(A: + l)i?jy = (k + l)-(+-^+xp(-^{Ek,k-log{k + l))^ 


a 



V i + ) 


X exp (^-al+{n/k) 

(n + 1)Dt 

k + l 




= E {I + Op{l)) - 1. 
Furthermore,


1 ^ 1 ^ ^1 in + 'i-)Uk+i,n 1 , in + l)p 

^OgXn-k,n-^OgQ{l-p) = -log-——-- + - log 

a k + 1 a k + 1 

1/1 (^ + l)F^r k +1 . 1 / Dt 

— log(l H -——j —-— . - ) H— log I 1 H - 

A; +1 (n + a \ p 


+ log ( Cr 


n + 1 


A; + 1 


A; + 1 (n + 

Dt.-i 


■[1 + 


{n + l)DT ._i 
k + 1 ^ 


— logluiCrP [1 H- 


p 


Finally, since under the given assumptions $(n+1)U_{k+1,n}/(k+1) \to_p 1$, $D_T/p \to 0$, and

(Ctj^ -^ 


log 


k-\-l 


_ j-j (n+li^T l-l 


k+1 


+ (CtP-+ + ^+ 


= 0{b{n/k)), 


we find that 


log Qp,k,n - log Q(1 - P) = F 


^ T 

at a 

. k.n 


- - 1 log( 


a 


k,n 


log 1 + 


in+l)p ^ 

k + 1 ' 

E-1 
(n + l)p 


(1 + Op(l)), 


from which the result follows with Theorem 1(b). □ 

Proof of Theorem 3. The first statement in (a) and (b) follow readily from 
Propositions 1(b) and 2(b). 


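In order to prove (c) we first derive the asymptotic distribution of $\sqrt{k}\,(E_{k,n}(1/H_{k,n}) - 1/2)$. Note that

$$\sqrt{k}\,(E_{k,n}(1/H_{k,n}) - 1/2) = \sqrt{k}\left(E_{k,n}(1/H_{k,n}) - \frac{1}{1 + 1/(\alpha H_{k,n})}\right) + \sqrt{k}\left(\frac{1}{1 + 1/(\alpha H_{k,n})} - \frac{1}{2}\right) = \mathbb{E}_{k,n}(1/H_{k,n}) + \frac{\Gamma_{k,n}}{2(H_{k,n} + \alpha^{-1})}$$

with

$$\Gamma_{k,n} = \sqrt{k}\left(H_{k,n} - \frac{1}{\alpha}\right), \qquad \mathbb{E}_{k,n}(s) = \sqrt{k}\left(E_{k,n}(s) - \frac{1}{1 + s/\alpha}\right), \quad s \ge 0.$$

In Theorem A.1 in Beirlant et al. (2009) [31] it is stated that $(\Gamma_{k,n}, \mathbb{E}_{k,n})$ converges weakly in the space $\mathbb{R} \times C[0, s_0]$ with $s_0 > 0$ and $C[a,b]$ the Banach space of continuous functions $f : [a,b] \to \mathbb{R}$ equipped with the topology of uniform convergence. The limit process $(\Gamma, \mathbb{E})$ is a Gaussian process with

$$\mathrm{Var}(\Gamma) = \frac{1}{\alpha^2}, \qquad \mathrm{Cov}(\Gamma, \mathbb{E}(s)) = \frac{-s/\alpha^2}{(1 + s/\alpha)^2}, \qquad \mathrm{Cov}(\mathbb{E}(s_1), \mathbb{E}(s_2)) = \frac{s_1 s_2/\alpha^2}{(1 + s_1/\alpha + s_2/\alpha)(1 + s_1/\alpha)(1 + s_2/\alpha)}.$$

From $H_{k,n} = 1/\alpha + o_p(1)$ we have for $s_0 > \alpha$ that $P(s_0 > 1/H_{k,n} \ge 0) \to 1$ as $n \to \infty$, and thus by the continuous mapping theorem $(\Gamma_{k,n}, \mathbb{E}_{k,n}(1/H_{k,n}))$ converges weakly to $(\Gamma, \mathbb{E}(\alpha))$. From this it then follows that

$$\mathbb{E}_{k,n}(1/H_{k,n}) + \frac{\Gamma_{k,n}}{2(H_{k,n} + \alpha^{-1})} \to_d \mathcal{N}(0, 1/48).$$

Then also

$$1 - E_{k,n}(1/H_{k,n}) \to_p 1/2,$$

from which the result (c) follows.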
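The constant 1/48 can be recovered from the covariance structure by exact arithmetic. The sketch below assumes $\mathrm{Var}(\Gamma) = 1/\alpha^2$, $\mathrm{Cov}(\Gamma, \mathbb{E}(s)) = -(s/\alpha^2)/(1+s/\alpha)^2$ and $\mathrm{Var}(\mathbb{E}(s)) = (s^2/\alpha^2)/\{(1+2s/\alpha)(1+s/\alpha)^2\}$, and evaluates the variance of $\mathbb{E}(\alpha) + (\alpha/4)\Gamma$, where $\alpha/4$ is the limit of $1/\{2(H_{k,n} + \alpha^{-1})\}$. Dividing by the limit 1/2 of $1 - E_{k,n}(1/H_{k,n})$ and multiplying by $\sqrt{12}$ then gives unit variance; this $\sqrt{12k}$ scaling of $T_{B,k,n}$ is inferred here from the stated limits and is not quoted from (31). The function names are illustrative.

```python
from fractions import Fraction


def limit_variance(alpha):
    """Variance of E(alpha) + (alpha / 4) * Gamma under the assumed covariances."""
    s = alpha
    var_gamma = 1 / alpha ** 2
    cov_gamma_e = -(s / alpha ** 2) / (1 + s / alpha) ** 2
    var_e = (s * s / alpha ** 2) / ((1 + 2 * s / alpha) * (1 + s / alpha) ** 2)
    c = alpha / 4   # limit of 1 / (2 * (H + 1/alpha)) when H tends to 1/alpha
    return var_e + c * c * var_gamma + 2 * c * cov_gamma_e


def scaled_variance(alpha):
    """Variance after dividing by the limit 1/2 and multiplying by sqrt(12)."""
    return 12 * limit_variance(alpha) / Fraction(1, 2) ** 2


for a in (Fraction(1, 2), Fraction(2), Fraction(7, 3)):
    print(a, limit_variance(a), scaled_variance(a))
```

The result does not depend on $\alpha$, which is what makes the limit distribution usable as a null distribution without knowledge of the tail index.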


To prove the second statement in (a), we obtain from (15) that 


Ek,n {^/— 


1 /Qr(l ~ Up-^i^n, 

Qril-Uj^n) 


k 

It. 

i=i 


1 + 


Uj.n Uk+l,u \ 
Uk-\-l,n Dt 


1 + 


C^fc + 1,7 
Dj^ 




Dj' 




1 [1 I t/j,n ^fc + l,n 1-l> 

. ^^Fw{T) ^ 


Uk + l,n Dt 


Setting $V_{j,k} = U_{j,n}/U_{k+1,n}$ $(1 \le j \le k)$ we have that $V_{1,k} \le \cdots \le V_{k,k}$ are distributed as the order statistics of a uniform (0,1) sample of size $k$. Moreover, for $\kappa_{k,n} := U_{k+1,n}/D_T$ we have $\kappa_{k,n} \to_p \kappa$. Following Proposition 1(a) it then follows that

$$H_{k,n} \to_p \frac{1}{\alpha}\left(1 - \frac{1}{\kappa}\log(1+\kappa)\right)$$

and


Upn Uk + l,n\ y(^^k,n) 


1 + _ 

Uk-\-l,n Dt 


1 + 


Dj^ 


1 + 

1 “ 1 “ n 


'fj,( 1 [1 I ^j,n ^fc+l,n 1-l\\ 

^ Uk+l,n Dt i M 


~ 1 - jPb{l/Fw(T)) (h,([l + KV,_t]-') - M[1 + «l“‘)) 

-^k,n 

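Since

$$\frac{1}{k}\sum_{j=1}^{k}\left(\frac{1 + \kappa V_{j,k}}{1 + \kappa}\right)^{\kappa/(\kappa - \log(1+\kappa))} \to_p \int_0^1 \left(\frac{1 + \kappa u}{1 + \kappa}\right)^{\kappa/(\kappa - \log(1+\kappa))} du = \frac{(1+\kappa)(\kappa - \log(1+\kappa))}{\kappa(2\kappa - \log(1+\kappa))}\left(1 - (1+\kappa)^{-\frac{2\kappa - \log(1+\kappa)}{\kappa - \log(1+\kappa)}}\right) \in (0, 1/2),$$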


and 


I ^(1 + (^p([l + ^k,nV,,k]-^) - hp{[l + Kk,n]-^)) = Op{l), 

J = 1 

the result follows.